If Concurrent GC Cannot Keep Up
Testing GC Criticality



Run concurrent collector on LITTLE vs. BIG and 
measure the difference in execution time 
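One way to turn this experiment into a number is the relative slowdown of the whole run when the collector is moved from BIG to LITTLE. The sketch below assumes that definition; the function name and the timings are hypothetical and not taken from the slides.

```python
def gc_criticality(time_gc_on_little: float, time_gc_on_big: float) -> float:
    """Relative slowdown of the run when the concurrent collector sits on LITTLE.

    Assumed definition (not from the slides): (T_little - T_big) / T_big.
    """
    if time_gc_on_big <= 0:
        raise ValueError("execution time must be positive")
    return (time_gc_on_little - time_gc_on_big) / time_gc_on_big


# Hypothetical end-to-end execution times in seconds, for illustration only.
slowdown = gc_criticality(time_gc_on_little=12.0, time_gc_on_big=10.0)
print(f"{slowdown:.1%}")  # prints 20.0%
```

A value near zero would suggest the collector keeps up even on the small core; a large value would suggest the application is GC-critical.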


Our Adaptive Scheduler


Measures GC criticality during runtime



Communicates from JVM down to scheduler
Operating System


Scheduler dynamically adapts big core cycles given to GC



Other Schedulers 


Baseline = "gc-on-small" 
gc-fair 

- All threads equal time on the big core(s) 

- Round-robin 

- Based on K. Van Craeynest, S. Akram, W. Heirman, A. Jaleel, 
and L. Eeckhout. Fairness-aware Scheduling on Single-ISA 
Heterogeneous Multi-cores. In PACT 2013. 
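The gc-fair idea of giving every thread equal time on the big core(s) through round-robin can be sketched as follows. This is a minimal illustration, not the scheduler from the cited paper; the function name, the thread names, and the quantum counts are assumptions.

```python
from collections import Counter


def round_robin_big_slots(threads, num_big, num_quanta):
    """Yield, for each scheduling quantum, the threads that get a big core.

    Threads are rotated so that big-core time is shared equally over time.
    """
    n = len(threads)
    start = 0
    for _ in range(num_quanta):
        yield [threads[(start + i) % n] for i in range(min(num_big, n))]
        start = (start + num_big) % n


# Hypothetical thread mix: two application threads and two GC threads.
threads = ["app0", "app1", "gc0", "gc1"]

# One big core, eight quanta: each of the four threads gets two quanta.
share = Counter(t for slot in round_robin_big_slots(threads, 1, 8) for t in slot)
print(share)
```

Under this policy GC threads are treated like any other thread, which is what distinguishes gc-fair from a scheduler that reacts to GC criticality.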



Our GC-criticality-aware Scheduler 

sampling interval 
State   How many ms scheduled on the BIG core

P0      First GC thread = 1 ms, second GC thread = 1 ms

P1      First GC thread = 1 ms, second GC thread = 2 ms



Sampling interval is a variable parameter (different from scheduling quantum) 
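The state table can be read as a small controller that is consulted once per sampling interval. The sketch below uses only the two states listed (written here as P0 and P1); the transition rule (step up when GC is judged critical, step down otherwise) is an assumption made for illustration, since the slides do not spell it out.

```python
# Big-core budget in ms for (first GC thread, second GC thread),
# limited to the two states shown on the slide.
BIG_CORE_MS = {
    "P0": (1, 1),
    "P1": (1, 2),
}
STATES = list(BIG_CORE_MS)


def next_state(state: str, gc_is_critical: bool) -> str:
    """Assumed rule: move up one state if GC is critical, else move down one."""
    i = STATES.index(state)
    i = min(i + 1, len(STATES) - 1) if gc_is_critical else max(i - 1, 0)
    return STATES[i]


state = "P0"
# One hypothetical criticality decision per sampling interval.
for critical in [True, True, False]:
    state = next_state(state, critical)
    print(state, BIG_CORE_MS[state])
```

Keeping the sampling interval separate from the scheduling quantum means the budget can be re-evaluated less often than threads are actually switched.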
Concurrent garbage collector
2 GC threads 



• BIG core is out-of-order (OoO)

• Small core is in-order 

• 3-level cache hierarchy 

• We vary the small core/ 

• We vary the # BIG cores 

• We vary the total # cores



4 single-threaded and 5 multithreaded 
We vary the number of threads (2-4) 


Experimental Results 


Lots of heterogeneous architectures 

- We will show 3B1S 

Baseline = gc-on-small 

Our adaptive scheduler, varying 

- Sampling interval 

- Imax



Performance of GC-criticality-aware Scheduler

[Figure: % execution time reduction for gc-fair, adaptive (Ts = 50 ms, Imax = 4), and adaptive (Ts = 100 ms, Imax = 8)]
Energy Efficiency of GC-criticality-aware Scheduler

[Figure: % reduction in EDP]
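EDP is the energy-delay product, that is, energy consumed multiplied by execution time, so a scheduler can improve it by saving energy, by running faster, or both. The short sketch below shows how a percentage EDP reduction is computed; the energy and time values are hypothetical and are not results from the slides.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def percent_reduction(baseline: float, new: float) -> float:
    """Reduction of `new` relative to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


# Hypothetical measurements, for illustration only.
baseline_edp = edp(energy_joules=100.0, delay_seconds=10.0)  # 1000 J*s
adaptive_edp = edp(energy_joules=90.0, delay_seconds=9.0)    # 810 J*s
print(percent_reduction(baseline_edp, adaptive_edp))
```

Because delay enters the product directly, a 10% energy saving combined with a 10% speedup yields a 19% EDP reduction in this example.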
Performance: Small Core Slower

[Figure: % execution time reduction with the small core at 2.66 GHz and at 1.66 GHz]
Conclusions


Concurrent garbage collection benefits from
out-of-order execution (BIG)

Multi-threaded applications exhibit GC criticality

Our GC-criticality-aware scheduler
dynamically gives GC more big core time

- Based on information from the virtual machine, passed down to the operating system

- Improves performance and energy efficiency
  for GC-critical applications
Sources of Inaccuracy in M+CRIT
Our Contribution


DEP+BURST 

A New DVFS Performance Predictor 
Managed Multithreaded Applications
Conclusions


• DEP+BURST: First predictor that accounts for 

- Application and service threads 

- Synchronization -> inter-thread dependencies 

- Store bursts 

• High accuracy 

- Less than 10% estimation error for Java benchmarks 

• Negligible hardware cost 

• Demonstrated energy savings 

- 20% avg. for a 10% slowdown (mem-intensive Java apps.)
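To illustrate how a DVFS performance predictor can be used to trade a bounded slowdown for energy, the sketch below picks the lowest frequency whose predicted slowdown stays within a limit. It uses the generic split of execution time into a frequency-scaling part and a non-scaling (memory) part. This is not the DEP+BURST model itself, and all names and numbers are hypothetical.

```python
def predicted_time(t_scaling, t_nonscaling, f_base, f_target):
    """Generic model: only the scaling part stretches as frequency drops."""
    return t_scaling * (f_base / f_target) + t_nonscaling


def lowest_frequency_within(t_scaling, t_nonscaling, f_base, candidates,
                            max_slowdown=0.10):
    """Return the lowest candidate frequency within the slowdown bound.

    Falls back to f_base if no candidate satisfies the bound.
    """
    limit = (t_scaling + t_nonscaling) * (1.0 + max_slowdown)
    ok = [f for f in candidates
          if predicted_time(t_scaling, t_nonscaling, f_base, f) <= limit]
    return min(ok) if ok else f_base


# Hypothetical memory-intensive run: 2 s scale with frequency, 8 s do not.
choice = lowest_frequency_within(
    t_scaling=2.0, t_nonscaling=8.0, f_base=4.0,
    candidates=[1.0, 2.0, 3.0, 4.0],
)
print(choice)  # prints 3.0
```

The more accurate the predicted time at each frequency, the closer such a policy can run to the slowdown bound without overshooting it.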